Sampling Precision to Depth 9000: Evaluation Experiments at NTCIR-6

Author

  • Stephen Tomlinson
Abstract

We describe evaluation experiments conducted by submitting retrieval runs for the Chinese, Japanese and Korean Single Language Information Retrieval subtasks of the Cross-Lingual Information Retrieval (CLIR) Task of the 6th NII Test Collection for IR Systems Workshop (NTCIR-6). We show that a Generalized Success@10 measure exposes a downside of the blind feedback technique that is overlooked by traditional ad hoc retrieval measures such as mean average precision, R-precision and Precision@10. Hence an important retrieval scenario, seeking just one item to answer a question, is not properly evaluated by the traditional ad hoc retrieval measures. Also, for each language, we submitted a one-percent subset of the first 9000 retrieved items to investigate the frequency of relevant items at deeper ranks than the official judging depth of 100. The results suggest that, on average, less than 60% of the relevant items for Chinese and less than 80% for Japanese are assessed.
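The abstract contrasts traditional ad hoc measures (mean average precision, Precision@10) with a success-based measure that rewards finding just one relevant item. The sketch below implements the standard measures plus a binary Success@10; the paper's "Generalized Success@10" smooths this by the rank of the first relevant item, and its exact formula is paper-specific, so only the standard binary form is shown here (an assumption for illustration).

```python
def average_precision(rels, num_relevant):
    """rels: 0/1 relevance judgments in rank order; num_relevant: total relevant docs."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0

def precision_at(rels, k=10):
    """Fraction of the top k retrieved items that are relevant."""
    return sum(rels[:k]) / k

def success_at(rels, k=10):
    """1 if any relevant item appears in the top k, else 0."""
    return 1 if any(rels[:k]) else 0

# Toy ranked list: relevant items at ranks 2 and 5, out of 3 relevant in total.
run = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(average_precision(run, num_relevant=3))  # (1/2 + 2/5)/3 = 0.3
print(precision_at(run))                       # 0.2
print(success_at(run))                         # 1
```

Note how a run that buries its first relevant item at rank 9 still scores Success@10 = 1 but a low average precision; this gap between the measures is what the paper's blind-feedback finding turns on.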


Similar Resources

Toshiba BRIDJE at NTCIR-6 CLIR: The Head/Lead Method and Graded Relevance Feedback

At NTCIR-6 CLIR, Toshiba participated in the Monolingual and Bilingual IR tasks covering three topic languages (Japanese, English and Chinese) and one document language (Japanese). For Stage 1 (which is the usual ad hoc task using the new NTCIR6 topics), we submitted two DESCRIPTION runs and two TITLE runs for each topic language. Our first search strategy is Selective Sampling with Memory Rese...



Ranking the NTCIR Systems Based on Multigrade Relevance

At NTCIR-4, new retrieval effectiveness metrics called Q-measure and R-measure were proposed for evaluation based on multigrade relevance. This paper shows through a theoretical analysis that Q-measure inherits both the reliability of noninterpolated Average Precision and the multigrade relevance capability of Average Weighted Precision, and then verifies the above claim through experiments by ac...


A Proposal to Extend and Enrich the Scientific Data Curation of Evaluation Campaigns

Using the experimental data, we produce different performance measurements, such as precision and recall, which are standard measures used to evaluate the performance of an Information Retrieval System (IRS) for a given experiment. Starting from these performance measurements, we can compute descriptive statistics, such as the mean or median, used to summarize the overall performance ach...
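The pipeline described above (per-topic precision and recall, then summary statistics) can be sketched minimally as follows; the topic data and document ids are hypothetical placeholders, not from the paper.

```python
from statistics import mean, median

def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one topic's retrieved list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical per-topic results: (retrieved ids, relevant ids).
topics = [
    (["d1", "d2", "d3", "d4"], ["d1", "d3", "d9"]),
    (["d5", "d6"], ["d6"]),
]

# Descriptive statistics summarizing the run across topics.
precisions = [precision_recall(ret, rel)[0] for ret, rel in topics]
print(mean(precisions), median(precisions))  # 0.5 0.5
```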


NTCIR-6 CLIR Experiments at Osaka Kyoiku University - Term Expansion Using Online Dictionaries and Weighting Score by Term Variety

This paper describes experimental results for the J-J subtask of NTCIR-6 CLIR. We expanded query terms using online dictionaries on the Web. This was effective for some topics whose average precision was low. A probabilistic model was employed for scoring, and we also modified this score by multiplying it by the number of varieties of query terms. In most cases this works well. Query term reduction should b...



Journal title:

Volume   Issue

Pages   -

Publication date: 2007